-
We propose a new data structure called CachedEmbeddings for efficiently training large-scale deep learning recommendation models (DLRM) on heterogeneous (DRAM + non-volatile) memory platforms. CachedEmbeddings implements an implicit software-managed cache and data movement optimization, integrated with the Julia programming framework, to optimize large-scale DLRM implementations with multiple sparse embedding table operations. In particular, we show an implementation that is 1.4X to 2X better than the best known Intel CPU-based implementations on state-of-the-art DLRM benchmarks on a real heterogeneous memory platform from Intel, and a 1.32X to 1.45X improvement over Intel’s 2LM implementation, which treats the DRAM as a hardware-managed cache.
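To make the idea of a software-managed cache for embedding rows concrete, here is a minimal sketch. It is not the CachedEmbeddings implementation, which the abstract does not detail: the class name `CachedRows`, the LRU eviction policy, and the list-of-lists table are all assumptions chosen for illustration. A small fast tier holds recently used rows in front of a large table that stands in for slow memory.

```python
from collections import OrderedDict


class CachedRows:
    """Toy LRU cache of embedding rows in front of a slow backing table.

    Hypothetical illustration only; the eviction policy is an assumption.
    """

    def __init__(self, backing, capacity):
        self.backing = backing      # stands in for the large table in slow memory
        self.capacity = capacity    # number of rows the fast tier may hold
        self.cache = OrderedDict()  # row index -> row, least recently used first
        self.hits = 0
        self.misses = 0

    def lookup(self, idx):
        if idx in self.cache:
            # Hit: mark the row as most recently used.
            self.cache.move_to_end(idx)
            self.hits += 1
            return self.cache[idx]
        # Miss: fetch from the backing table and install in the fast tier.
        self.misses += 1
        row = self.backing[idx]
        self.cache[idx] = row
        if len(self.cache) > self.capacity:
            # Evict the least recently used row.
            self.cache.popitem(last=False)
        return row


table = [[float(i)] * 4 for i in range(1000)]  # 1000 rows of width 4
cache = CachedRows(table, capacity=2)
for idx in [7, 7, 42, 7, 99, 42]:
    cache.lookup(idx)
print(cache.hits, cache.misses)
```

With sparse lookups that reuse a few popular rows, most accesses would be served from the fast tier, which is the general motivation for caching embedding rows in DRAM.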
-
Non-volatile memory (NVRAM) based on phase-change memory (such as Optane DC Persistent Memory Module) is making its way into Intel servers to address the needs of emerging applications that have a huge memory footprint. These systems have both DRAM and NVRAM on the same memory channel, with the smaller-capacity DRAM serving as a cache for the larger-capacity NVRAM in the so-called 2LM mode. In this work, we analyze the performance of such DRAM caches on real hardware using a broad range of synthetic and real-world benchmarks. We identify three key limitations of DRAM caches in these emerging systems that prevent large-scale, bandwidth-bound applications from taking full advantage of NVRAM read and write bandwidth. We show that software-based techniques are necessary for orchestrating the data movement between DRAM and PMM for such workloads to take full advantage of these new heterogeneous memory systems.
-
Acidification of the ocean due to high atmospheric CO₂ levels may increase the resilience of diatoms, causing dramatic shifts in abiotic and biotic cycles with lasting implications on marine ecosystems. Here, we report a potential bioindicator of a shift in the resilience of a coastal and centric model diatom, Thalassiosira pseudonana, under elevated CO₂. Specifically, we have discovered, through EGFP-tagging, a plastid membrane localized putative Na⁺(K⁺)/H⁺ antiporter that is significantly upregulated at >800 ppm CO₂, with a potentially important role in maintaining pH homeostasis. Notably, transcript abundance of this antiporter gene was relatively low and constant over the diel cycle under contemporary CO₂ conditions. In future acidified oceanic conditions, dramatic oscillation with >10-fold change between nighttime (high) and daytime (low) transcript abundances of the antiporter was associated with increased resilience of T. pseudonana. By analyzing metatranscriptomic data from the Tara Oceans project, we demonstrate that phylogenetically diverse diatoms express homologs of this antiporter across the globe. We propose that the differential between night- and daytime transcript levels of the antiporter could serve as a bioindicator of a shift in the resilience of diatoms in response to high CO₂ conditions in marine environments.
-
Memory capacity is a key bottleneck for training large-scale neural networks. Intel® Optane DC PMM (persistent memory modules), which are available as NVDIMMs, are a disruptive technology that promises significantly higher read bandwidth than traditional SSDs at a lower cost per bit than traditional DRAM. In this work, we show how to take advantage of this new memory technology to minimize the amount of DRAM required without compromising performance significantly. Specifically, we take advantage of the static nature of the underlying computational graphs in deep neural network applications to develop a profile-guided optimization based on Integer Linear Programming (ILP), called AutoTM, to optimally assign and move live tensors to either DRAM or NVDIMMs. Our approach can replace 50% to 80% of a system's DRAM with PMM while losing only 27.7% of performance (geometric mean). This is a significant improvement over first-touch NUMA, which loses 71.9% of performance. The proposed ILP-based synchronous scheduling technique also provides 2x performance over using DRAM as a hardware-controlled cache for very large networks.
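As a rough illustration of the placement problem described above, the sketch below assigns each tensor to DRAM or PMM so that the total slowdown is minimized under a DRAM capacity limit. It is not AutoTM: it uses exhaustive search in place of an ILP solver, it ignores moving tensors over time, and the function name `place_tensors`, the sizes, and the per-tensor penalties are made-up values for demonstration.

```python
from itertools import product


def place_tensors(sizes, pmm_penalty, dram_capacity):
    """Pick a DRAM/PMM placement minimizing the total penalty of PMM-resident tensors.

    Hypothetical toy model: each tensor has a size and a fixed slowdown
    penalty that is paid only if the tensor lives in PMM.
    """
    best_cost, best = None, None
    # Enumerate every 0/1 assignment; 1 = DRAM, 0 = PMM.
    for choice in product((0, 1), repeat=len(sizes)):
        dram_used = sum(s for s, c in zip(sizes, choice) if c)
        if dram_used > dram_capacity:
            continue  # violates the DRAM capacity constraint
        cost = sum(p for p, c in zip(pmm_penalty, choice) if not c)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, choice
    return best, best_cost


sizes = [4, 3, 2, 1]                  # tensor sizes in arbitrary units
pmm_penalty = [10.0, 9.0, 4.0, 1.0]   # assumed slowdown if placed in PMM
placement, cost = place_tensors(sizes, pmm_penalty, dram_capacity=5)
print(placement, cost)
```

Exhaustive search scales as 2^n and is only workable for a handful of tensors. The same objective and capacity constraint can be stated with binary variables and handed to an ILP solver, which is the kind of formulation the abstract refers to.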